Abstract

We use activity networks (task graphs) to model parallel programs and consider series-parallel
extensions of these networks. Our motivation is two-fold: the benefits of series-parallel activity
networks and the modelling of programming constructs, such as those imposed by current parallel
computing environments. Series-parallelisation adds precedence constraints to an activity network,
usually increasing its makespan (execution time). The slowdown ratio describes how additional
constraints affect the makespan. We disprove an existing conjecture positing a bound of two on
the slowdown when workload is not considered. Where workload is known, we conjecture that 4/3
slowdown is always achievable, and prove our conjecture for small networks using max-plus algebra.
We analyse a polynomial-time algorithm showing that achieving 4/3 slowdown is in exp-APX. Finally,
we discuss the implications of our results.

1 Introduction

An approach to reducing the execution time of a computer program is to run it on multiple processors
simultaneously. The study of parallel programming and architectures has seen a resurgence with the
widespread adoption of multi-core processing units in computing systems. Commercial numerical software 
such as MATLAB^1 and Mathematica^2 can now take advantage of multiple processors, and OpenCL is a
recently finalised standard for programming with multiple-processor systems [15]. 

An important aspect of parallel programming is scheduling, the method by which code is allocated to 
processors [12]. Here we instead consider the inherent precedence constraints of a parallel program and 
the constraints imposed by transformation and by the programming constructs that are used to describe
parallelism, both of which affect execution time. Our concerns are orthogonal to scheduling since we 
assume sufficient processors and hence the decision on what to schedule next is unimportant. 

A program can be divided up into activities or tasks. This can be done in different ways depending on 
the granularity used. Here we do not consider granularity further but assume some reasonable approach 
has been used. The activities can be related to each other by the order in which they must occur for 
the program to work correctly. For instance, if one activity modifies a variable and another activity uses 
this modified value, then the modifying activity must occur before the activity that uses the new value. An
activity that must occur before another precedes the other activity and there is a precedence constraint 
between the two activities. Precedence is imposed by the structure of the program and is inherent to the 
particular set of activities. 



^1 Via the MATLAB Parallel Computing Toolbox, http://www.mathworks.com/
^2 From version 7. http://www.wolfram.co.uk/

[Figure 1: Activity networks. (a) Neighbour synchronisation example; (b) N network.]

The formalism used to describe precedences between activities is known as an activity network, network 
or task graph. We use the activity-on-node variant of this model, where weights are associated with the
vertices of the network. These and variants such as PERT networks or machine schedules (sometimes 
with edge instead of node weights) are widely used in fields such as project management, operational 
research, combinatorial optimization and computer science. 

Activity networks can be classified by their structure. Structures of interest are series-parallel (SP)
networks and level-constrained (LC) networks [13]; the latter are a proper subset of SP and a subset of Bulk Synchronous
Programming (BSP), which has been used successfully as an approach to parallel programming [1, 18].
Analysis of activity networks is difficult in general but is easier for SP networks [4]. For instance, scheduling is NP-hard
in general but polynomial-time for SP networks [6]. We call the addition of constraints to achieve an SP activity
network series-parallelisation (SP) [7].

Programming constructs can also impose an SP structure over and above the inherent constraints. 
The most obvious is the sequencing of commands in a sequential programming language but the addition 
of constraints can also occur with parallel constructs as we show in the motivating example in Section 2. 

The precedence constraints between activities determine the minimum time to execute the program. 
Assuming a sufficient number of processors and non-preemptive, work-conserving scheduling, the fastest
time for execution will be the time taken to execute the slowest chains of activities, called critical paths.
Chains consist of activities that are totally ordered and hence must proceed one after another, excluding 
the possibility of parallelism. 

This paper considers the difference in execution time between activity networks, comparing a network 
with only inherent precedence constraints with the same network with added precedence constraints 
that make it an SP structure. Adding constraints results in programs that take at least as long and 
we consider the slowdown where slowdown is the ratio of the slower program to the faster one. We 
characterise the slowdown induced by LC and disprove an existing conjecture about slowdown for SP 
[19]. This requires demonstrating that large slowdown can occur for every possible series-parallelisation
of a specific network. A new conjecture is presented, and results proved for small instances. Additionally 
we discuss the complexity of finding the optimal SP for a network. First we present a motivating example, 
followed by background and definitions of the relevant structures after which come the main results and 
conjecture. We finish with the implications of our results and further research. 

2 Motivating example 

We next consider a simple example involving computations dependent on earlier computations. In a
1-dimensional flow model of heat diffusion in a metal rod, we calculate the temperature at $m$ points for
each time step. The temperature at time $\tau + 1$ at point $p_i$ is dependent on the temperature at time $\tau$ at
points $p_{i-1}$, $p_i$ and $p_{i+1}$. If we view each calculation as an activity $a_{\tau,i}$, this is an example of neighbour
synchronisation (NS) as illustrated in Figure 1(a) when considering the solid lines only. This network is
not SP because of the edges $(a_{1,1}, a_{2,1})$, $(a_{1,1}, a_{2,2})$ and $(a_{1,3}, a_{2,2})$ and the lack of the edge $(a_{1,3}, a_{2,1})$.
This is an example of the smallest non-SP activity network, the N network shown in Figure 1(b). There
are many instances of N in the example activity network.

An obvious (although not necessarily the best) way to series-parallelise this activity network is to
require all activities at time $\tau$ to precede those at time $\tau + 1$. The dashed lines in Figure 1(a) illustrate
the added precedence constraints. The edge $(a_{1,3}, a_{2,1})$ is added as well as edges to remove the other N
networks. Figure 1(a) is an example of a level-constrained (LC) extension.

Assume unit workloads for all activities apart from one much slower activity at each time instance $\tau$
with duration $t(a_{\tau,2\tau-1}) = C \gg 1$. Hence for every calculation of a specific point over time, there is only
one large workload. The execution time for the above series-parallelisation will be $(C - 1)(m + 1)/2 + s$
where $s > n$ is the total number of timesteps. This gives large slowdown since the execution time
considering only inherent constraints is $C + s - 1$.
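As an illustration that is not part of the original analysis, the two execution times can be checked numerically. The following minimal Python sketch assumes a degree-3 NS network with an odd number of points $m$, enough time steps that every large activity fits, and hypothetical helper names (`ns_edges`, `makespan`, `lc_makespan`) and sample values chosen only for this example.

```python
from functools import lru_cache

def ns_edges(depth, width):
    """Direct precedence edges of a degree-3 NS network (1-based indices)."""
    return {
        ((i, j), (i + 1, j + k))
        for i in range(1, depth)
        for j in range(1, width + 1)
        for k in (-1, 0, 1)
        if 1 <= j + k <= width
    }

def makespan(nodes, edges, t):
    """Length of the longest weighted chain (critical path), assuming enough processors."""
    preds = {v: [] for v in nodes}
    for u, v in edges:
        preds[v].append(u)

    @lru_cache(maxsize=None)
    def finish(v):
        return t[v] + max((finish(u) for u in preds[v]), default=0)

    return max(finish(v) for v in nodes)

def lc_makespan(depth, width, t):
    """Makespan when each time step must finish before the next one starts."""
    return sum(max(t[(i, j)] for j in range(1, width + 1)) for i in range(1, depth + 1))

# Sample values (assumptions for this sketch): m points, s time steps, large duration C.
m, s, C = 5, 7, 10
nodes = [(i, j) for i in range(1, s + 1) for j in range(1, m + 1)]
t = {v: 1 for v in nodes}
for tau in range(1, (m + 1) // 2 + 1):
    t[(tau, 2 * tau - 1)] = C          # one slow activity per early time step

inherent = makespan(nodes, ns_edges(s, m), t)   # expected to equal C + s - 1
levelled = lc_makespan(s, m, t)                 # expected to equal (C - 1)(m + 1)/2 + s
print(inherent, levelled)
```

With these sample values the inherent constraints allow at most one slow activity on any chain, whereas the levelled version pays for a slow activity at each of the first $(m + 1)/2$ time steps.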

There may be better ways to series-parallelise this network, however a language such as MATLAB 
may impose a particular SP activity network through its programming constructs. If one expresses this 
example as parallel code using the parfor statement (in the obvious simple way) then one will achieve
the SP network given in Figure 1(a). 

An understanding of the slowdown obtained by various forms of series-parallelisation is therefore 
important, particularly due to the increased usage of parallel programming constructs to take advantage 
of multi-core processors. 

3 Background 

This section defines notation and basic concepts for activity-on-node networks. 

Definition 1. An activity-on-node network (task graph, activity network, or simply, network) consists 
of 

• $V = \{a_1, \ldots, a_n\}$, a set of activities,

• $G = (V, E)$, a directed acyclic graph with precedence constraints $E \subseteq V \times V$,

• $t : V \to (0, \infty)$, a workload assigning a duration to each activity.

A precedence constraint $(a, b)$ captures the idea that activity $a$ must complete before activity $b$ can
begin. We assume that we are working with the transitive closure of the precedence constraints, namely 
that the precedence relation is irreflexive and transitive. However, when drawing activity networks, we 
only draw the edges that appear in the transitive reduction of the network. 

The makespan of an activity network G, denoted T(G), is the time to complete all activities of the 
network. This depends on the scheduling policy and the number and configuration of processors. We 
make the following assumptions. 

Scheduling: We assume non-preemptive scheduling, namely once an activity is assigned to a pro- 
cessor, it will complete on that processor without interruption; and a work-conserving scheduling policy, 
namely no processor is left idle if there are still activities waiting to start. 

Number and type of processors: The processors are identical and there are sufficiently many, 
in the sense that any activity that is ready to execute can be started. It is sufficient to have as many 
processors available as the width of the activity network. 

Overheads: All overheads such as communication, contention and decisions about which activity to 
execute next are included in the workload. 

Given these assumptions, we can characterise the makespan of activity networks. 
Definition 2. Let $G = (V, E)$ and $G' = (V', E')$ be directed graphs.

• $G$ is a subgraph of $G'$, written $G \subseteq G'$, if $V \subseteq V'$ and $E \subseteq E'$.

• If $G$ is a subgraph of $G'$ then $G'$ is a supergraph of $G$.

• $G$, a subgraph of $G'$, is an antichain if $E$ is empty.

• $G$, a subgraph of $G'$, is a chain if $E$ is a total order over $V$.

• $G'$, a supergraph of $G$, is an extension if $E \subseteq E'$ and $V = V'$.

An extension formally defines what it means to add precedence constraints and does not permit
addition of activities so $t$ remains unchanged. A subnetwork has the obvious meaning.

Definition 3. Let $G = (V, E)$ be an activity network.

• $\mathrm{depth}(G) = \max\{|C| : C \text{ a chain in } G\}$.

• $\mathrm{width}(G) = \max\{|A| : A \text{ an antichain in } G\}$.

A chain represents its activities occurring one after the other, and hence the time taken for a chain to
execute is the sum of the durations for each activity.

Proposition 1. The makespan of a chain $C = (V, E)$ with $V = \{a_1, \ldots, a_n\}$ is
$T(C) = \sum_{i=1}^{n} t(a_i)$.

The makespan of an activity network can be characterised as the time it takes to complete a chain 
in the network with the longest completion time (a critical path). The proof is straightforward, and 
makes essential use of the work-conserving property of the scheduling policy, and the fact that there 
are sufficient processors. If the number of processors is insufficient, a work-conserving approach may be 
sub-optimal [11]. 

Proposition 2. The makespan of an activity network $G = (V, E)$ is
$T(G) = \max\{T(C) \mid C \text{ is a chain in } G\}$.

When we create extensions by adding constraints to obtain a specific network structure, we cannot
decrease the time that the activity network will take to complete [10, 16]. We can define the ratio between
the two makespans as a slowdown^3.

Definition 4. Let $H$ be an extension of $G$; then the slowdown is $T(H)/T(G)$.

4 Structure of activity networks 

We need to define what it means for an activity network to be series-parallel. Figure 2(a) is not SP and
Figure 2(b) is SP. The N network in Figure 1(b) is also not SP. An activity network is SP if it consists 
of a single activity or can be recursively decomposed into chains and antichains using series and parallel 
composition. 

Definition 5. An activity network $G = (V, E)$ is series-parallel (SP) if $G$ can be expressed using the SP
grammar $g ::= (g \oplus g) \mid g \cdot g \mid a$, where $a$ is an activity, and each activity appears at most once. A string
generated by the SP grammar is an SP expression.

We also use juxtaposition $G_1 G_2$ for $G_1 \cdot G_2$. The network in Figure 2(b) can be expressed as
$(a(((b \oplus c)(d \oplus e)f) \oplus (gh(i \oplus j)(k \oplus l \oplus m))) \oplus nop)$.

Definition 6. Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ be activity networks with $V_1 \cap V_2 = \emptyset$ and $E_1 \cap E_2 = \emptyset$.

• The parallel composition of $G_1$ and $G_2$ is $G_1 \oplus G_2 = (V_1 \cup V_2, E_1 \cup E_2)$.

• The series composition of $G_1$ and $G_2$ is $G_1 \cdot G_2 = (V_1 \cup V_2, (V_1 \times V_2) \cup E_1 \cup E_2)$.

[Figure 2: Series-parallelisation of an activity network. (a) Non-series-parallel network; (b) Series-parallel network.]

^3 If we were comparing a sequential program with its parallel version, we would consider speedup, namely the ratio of the
faster to the slower. Since we know that the program with additional precedence constraints will take at least as long as
the original, we consider slowdown, the ratio of the slower to the faster.

SP networks are exactly those that do not contain the N network [17]. If we have a network that is 
not SP, we can add constraints until it is SP. An SP extension of an activity network always exists since 
if we add sufficient constraints we obtain a chain, which is SP [5]. The activity network in Figure 2(b) 
is a series-parallelisation of the activity network in Figure 2(a). We can easily calculate the makespan of 
an SP network. 
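The characterisation by forbidden N subnetworks suggests a direct, if inefficient, test for small networks. The sketch below is an illustration under our own assumptions rather than the method of [17]: it computes the transitive closure and searches every choice of four activities for an induced N. The function names and the brute-force search are hypothetical choices for this example and are only practical for small networks.

```python
from itertools import permutations

def transitive_closure(nodes, edges):
    """Return the set of all pairs (u, v) such that u precedes v."""
    succ = {v: set() for v in nodes}
    for u, v in edges:
        succ[u].add(v)
    closure = set()
    for start in nodes:
        stack, seen = list(succ[start]), set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])
        closure |= {(start, v) for v in seen}
    return closure

def contains_n(nodes, edges):
    """True if four activities induce an N: a<c, a<d, b<d and no other relation."""
    lt = transitive_closure(nodes, edges)

    def unrelated(x, y):
        return (x, y) not in lt and (y, x) not in lt

    for a, b, c, d in permutations(nodes, 4):
        if ((a, c) in lt and (a, d) in lt and (b, d) in lt
                and unrelated(a, b) and unrelated(c, d) and unrelated(b, c)):
            return True
    return False

def is_series_parallel(nodes, edges):
    """SP test via the forbidden-N characterisation stated in the text."""
    return not contains_n(nodes, edges)

n_edges = [("a", "c"), ("a", "d"), ("b", "d")]   # the N network itself
k_edges = n_edges + [("a", "b")]                 # one added constraint
print(is_series_parallel("abcd", n_edges), is_series_parallel("abcd", k_edges))
```

Adding the single constraint $(a, b)$ to N removes the forbidden pattern, which is consistent with the remark that constraints can be added until the network is SP.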

Proposition 3. Let $G = (V, E)$ be an SP activity network with SP expression $g$. The makespan of $G$ is
$T(G) = T(g)$ where

$T(g_1 \oplus g_2) = \max\{T(g_1), T(g_2)\}$, $T(g_1 \cdot g_2) = T(g_1) + T(g_2)$, $T(a) = t(a)$.

This links the SP grammar with the max-plus algebra [3]. For convenience, the symbol $\oplus$ will denote
max and the symbol $\cdot$ will denote arithmetic +.
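The recursive rule for the makespan of an SP expression translates almost directly into code. The following Python sketch is illustrative only: the tuple encoding of expressions, the helper names `ser` and `par`, and the unit workload are assumptions made for this example, and the expression is the one given for Figure 2(b) as we read it.

```python
def sp_makespan(expr, t):
    """Evaluate an SP expression: series adds durations, parallel takes the maximum."""
    if isinstance(expr, str):              # a single activity
        return t[expr]
    op, *parts = expr
    values = [sp_makespan(p, t) for p in parts]
    if op == "ser":
        return sum(values)
    if op == "par":
        return max(values)
    raise ValueError(f"unknown operator: {op!r}")

def ser(*parts):
    """Series composition of sub-expressions."""
    return ("ser", *parts)

def par(*parts):
    """Parallel composition of sub-expressions."""
    return ("par", *parts)

# Expression for Figure 2(b), with a unit workload assumed for illustration.
fig2b = par(
    ser("a", par(ser(par("b", "c"), par("d", "e"), "f"),
                 ser("g", "h", par("i", "j"), par("k", "l", "m")))),
    ser("n", "o", "p"),
)
unit = {x: 1 for x in "abcdefghijklmnop"}
print(sp_makespan(fig2b, unit))
```

Under the unit workload the critical path runs through $a$, $g$, $h$, one of $i, j$ and one of $k, l, m$, so the evaluation returns 5.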

Level-constrained networks are a strict subset of SP. The level of an activity a is the size of a maximal 
chain in the network which has a as its last activity. 

Definition 7. For an activity network $G = (V, E)$, the level of an activity $a$ is
$\lambda(a) = \max\{|C| : C \text{ is a chain in } G,\ a \in C, \text{ and for all } b \in C \setminus \{a\},\ (b, a) \in E\}$.

The level of each activity in a network can be computed in polynomial time, by marking activities 
in a breadth-first search of the network's transitive reduction. The depth of an activity network is the 
maximum level of its activities. We can now add precedence constraints to obtain an extension of the 
network that maintains its level structure. This is a common technique [13, 16]. 

Definition 8. For an activity network $G = (V, E)$, the level-constrained (LC) extension of $G$ is the
network $G_L = (V, E_L)$, where $E_L = \{(a, b) \mid \lambda(a) < \lambda(b)\}$.

Note that $G_L$ is an extension of $G$, and that $\mathrm{depth}(G) = \mathrm{depth}(G_L)$. We can identify a level as
$A_i = \{a \in V \mid \lambda(a) = i\}$; each level is an antichain and the levels partition the activities. $G_L$ is also in
BSP form [18] since each level consists of independent chains (of size one, in this case) and all activities
in one level must complete before any activity in the next level can start. LC networks have the form
$\alpha_1 \cdots \alpha_d$ where $\alpha_i = (a_{i,1} \oplus \cdots \oplus a_{i,m_i})$.
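The level computation and the LC extension can be sketched in a few lines. This is a minimal illustration, assuming the network is given as a list of direct precedence edges; it uses a memoised depth-first traversal rather than the breadth-first marking mentioned above, and the function names are hypothetical.

```python
def levels(nodes, edges):
    """Level of each activity: the size of the longest chain ending at it."""
    preds = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)
    level = {}

    def visit(v):
        if v not in level:
            level[v] = 1 + max((visit(u) for u in preds[v]), default=0)
        return level[v]

    for v in nodes:
        visit(v)
    return level

def lc_extension(nodes, edges):
    """Edges of the LC extension: a precedes b whenever level(a) < level(b)."""
    lam = levels(nodes, edges)
    return {(a, b) for a in nodes for b in nodes if lam[a] < lam[b]}

# The N network: a and b sit on level 1, c and d on level 2.
n_edges = [("a", "c"), ("a", "d"), ("b", "d")]
n_levels = levels("abcd", n_edges)
n_lc = lc_extension("abcd", n_edges)
print(n_levels, sorted(n_lc))
```

For the N network the LC extension adds only the edge $(b, c)$, giving two levels in series, each of which is an antichain.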

We consider a structure that is non-SP for sufficiently large networks. 

Definition 9. A neighbour synchronisation (NS) network of depth $d$, width $w$, and degree $\Delta$, denoted
$ns(d, w, \Delta)$, consists of activities $a_{i,j}$ with $i \in \{1, \ldots, d\}$, $j \in \{1, \ldots, w\}$, and precedence constraints
$(a_{i,j}, a_{i+1,j+k})$ for every $k = -\lfloor(\Delta - 1)/2\rfloor, -\lfloor(\Delta - 1)/2\rfloor + 1, \ldots, \lceil(\Delta - 1)/2\rceil$ (as long as $1 \le j + k \le w$).

Figure 1(a) depicts an NS network of depth t, width m, and degree 3. The dashed precedence 
constraints are those added by the process of LC extension. 
5 Bounding LC slowdown



There are three reasons for considering LC networks. First, they relate to BSP, a useful applied technique 
for parallel programming, and second, they are efficient to construct for any given activity network. Last, 
it is straightforward to construct an upper-bound on the slowdown for a given workload t. 

Theorem 1. Given an activity network $G = (V, E)$ and its LC extension $G_L = (V, E_L)$, the slowdown
is bounded by the ratio $\rho$ of the largest to the smallest duration in the workload:

$\frac{T(G_L)}{T(G)} \le \frac{\max\{t(a) \mid a \in V\}}{\min\{t(a) \mid a \in V\}} = \rho.$

Proof. Given an LC extension $G_L = (V, E_L)$ of $G$, its makespan is
$T(G_L) = \sum_{i=1}^{\mathrm{depth}(G)} \max\{t(a) \mid a \in A_i\} \le \mathrm{depth}(G) \cdot \max\{t(a) \mid a \in V\}$
since a critical path is determined by the slowest activity at each
level and is bounded by the depth times the largest duration. Also $\mathrm{depth}(G) \cdot \min\{t(a) \mid a \in V\} \le T(G)$
since the depth of $G$ is the size of the longest chain and the time taken for each activity in this chain is at
least as long as the activity with the shortest duration. The result follows from these two inequalities. □

By Theorem 1, if all activities have similar durations, then the slowdown will be close to one. If we
know in advance that the duration ratio $\rho$ is small, then it is reasonable to series-parallelise using an LC extension. This is
efficient to obtain, and BSP is then also an appropriate model for the computation, since any BSP can
be transformed to LC by treating independent chains as single activities.

Conversely, if $\rho$ is large, its importance depends on how tight the bound is. If it is tight, and we know that large
values may occur because an activity could be delayed (for instance, due to a cache miss, or swapping
to and from disk, or because of competition for resources), then the large value of $\rho$ indicates that an LC
extension is a poor choice for series-parallelisation.

By considering $ns(d, w, 3)$ with $w > 2d - 1$, we can demonstrate that slowdown for the LC extension
can be arbitrarily close to $\rho$.

Proposition 4. For any $\epsilon > 0$, there exists an NS activity network $G$ and a workload $t$ such that
$\rho - T(G_L)/T(G) < \epsilon$.

However, $\rho$ can be pessimistic: consider $ns(1, w, 3)$ with one large activity and many small ones. This
is already SP, yet $\rho$ can be made arbitrarily large.

The next section presents two conjectures about bounds for general series-parallelisations of activity 
networks. 



6 Bounding SP slowdown 

This section considers a conjecture by van Gemund [19]. We need to introduce a parameterised notation
for makespan. Denote the makespan by T(G, t) to indicate specifically the role of the workload function 
t. There are two different classes of algorithms that can be used to obtain a series-parallelisation. We use 
the notation S{G, t) to denote the SP network that is the output of some algorithm that considers both 
the graph and the workload, and S'{G) to denote the SP network that is the output of some algorithm 
that considers only the graph. Using this notation we can posit two distinct hypotheses: 

Workload-independent: $\exists k\ \forall G\ \exists S'\ \forall t\ [\,T(S'(G), t)/T(G, t) \le k\,]$

Workload-dependent: $\exists k\ \forall G\ \forall t\ \exists S\ [\,T(S(G, t), t)/T(G, t) \le k\,]$

These can be understood as follows. The first states that for every graph, there is a series-parallelisation 
with a slowdown bound of k that works for every possible workload on that graph and the second states 
that given a graph and a workload, there is a series-parallelisation with slowdown bound of k. 

Van Gemund [19] conjectures that k = 2 is a bound for slowdown for the workload-independent case. 
Conjecture 1 ([19]). For any activity network $G = (V, E)$, it is possible to find an SP extension $G_{SP}$ of
$G$, such that for every workload $t : V \to (0, \infty)$,

$\frac{T(G_{SP}, t)}{T(G, t)} \le 2.$

There is an algorithm that meets this bound under "reasonable" workloads [19]. The following result 
disproves Conjecture 1. 

Theorem 2. For any series-parallelisation of $Q = ns(3, 8, 3)$, there exists a workload leading to slowdown
greater than 2.

We need some lemmas for the proof. 

Lemma 1. Any SP extension of a weakly connected network $G$ will have an SP expression of the form
$\alpha\beta$, where both $\alpha$ and $\beta$ are SP expressions.

Proof. An SP expression $\alpha \oplus \beta$ has no constraints between activities in $\alpha$ and in $\beta$, so the network is
disconnected. The result follows by contradiction. □

Lemma 2. Suppose an NS network $G$ has SP expression $\alpha\beta$ with $\Delta$ odd.

1. If $a_{i,j}$ is in $\alpha$ then $a_{k,l}$ is also, whenever $(\Delta - 1)(i - k)/2 \ge |j - l|$.

2. If $a_{i,j}$ is in $\beta$ then $a_{k,l}$ is also, whenever $(\Delta - 1)(k - i)/2 \ge |j - l|$.

Proof. Suppose $a_{i,j}$ is in $\alpha$. By the definition of NS networks, $(\Delta - 1)(i - k)/2 \ge |j - l|$ means that
$a_{k,l}$ precedes $a_{i,j}$. If $a_{k,l}$ were in $\beta$ then $a_{i,j}$ would precede $a_{k,l}$, which is impossible. The second part is
symmetric. □

For a network $G$ and an SP expression $\alpha$, let $G|_\alpha$ denote the subnetwork of $G$ consisting of only those
activities that appear in $\alpha$.

Lemma 3. Suppose $d \ge 3$ and $w \ge 3$. Any SP extension of $ns(d, w, 3)$ will have an SP expression of the
form $\alpha\beta$, where either $G|_\alpha$ or $G|_\beta$ is not SP.

Proof. We argue a contradiction for $ns(3, 3, 3)$; the result follows for larger $w$ and $d$ by considering any
subnetwork isomorphic to $ns(3, 3, 3)$ which is not completely contained in either $\alpha$ or $\beta$. Suppose $G|_\alpha$ and
$G|_\beta$ are both SP. Suppose activities $a_{2,1}$ and $a_{2,3}$ are both in $\alpha$ without loss of generality. By Lemma 2, $a_{1,1}$
and $a_{1,2}$ are then both in $\alpha$ or both in $\beta$. However, $\{a_{1,1}, a_{1,2}, a_{2,1}, a_{2,3}\}$ forms an N network in $G$, so $G|_\alpha$
cannot be SP. Now suppose activity $a_{2,1}$ is in $\alpha$ and $a_{2,3}$ is in $\beta$ (the opposite arrangement is symmetric).
If $a_{2,2}$ is in $\alpha$ then $\{a_{1,1}, a_{1,3}, a_{2,1}, a_{2,2}\}$ forms an N network in $G$; if $a_{2,2}$ is in $\beta$ then $\{a_{2,2}, a_{2,3}, a_{3,1}, a_{3,3}\}$
forms an N network in $G$. Hence at least one of $G|_\alpha$ or $G|_\beta$ is not SP. □

The depth of 3 in Lemma 3 is necessary, as any NS network of depth 2 can be made SP by enforcing 
level 1 to precede level 2, and each level is an SP network. Further, any width 2 NS network is SP, so 
the width of 3 is also necessary. 

Proof (of Theorem 2). We show that in any SP extension $Q'$ of $Q$, there must exist three activities $a, b, c$
which form an antichain in $Q$ but a chain in $Q'$, and then construct a suitable workload using this chain.
Possible arrangements of $a, b, c$ are illustrated.



By Lemma 1, any SP extension $Q'$ has an SP expression of the form $\alpha\beta$. Now by Lemma 3, at least one of $\alpha$
or $\beta$ is not SP. Moreover, the subnetwork of just the last three columns is isomorphic to $ns(3, 3, 3)$, so its
activities that are in either $\alpha$ or $\beta$ must form a non-SP subnetwork. Without loss of generality, suppose
this is $\beta$ (in the degenerate case there may then be no activities in $\alpha$ from the last three columns).

Now $a_{3,6}$, $a_{3,7}$, $a_{3,8}$ must all be in $\beta$ by Lemma 2, by a similar argument to that in the proof of
Lemma 3. There are now two possibilities.

The first is that at least one of $a_{1,1}$, $a_{1,2}$, $a_{1,3}$ appears in $\alpha$. In this case, denote this activity by $a$.
Further, at least two of $a_{2,6}$, $a_{2,7}$, $a_{2,8}$ must be in $\beta$, and these two together with two of $a_{3,6}$, $a_{3,7}$, $a_{3,8}$ then
form an N subnetwork $Q_N$ of $Q$. Note that in $Q$, $a$ does not precede any of the activities of $Q_N$.

The second possibility is that $a_{1,1}$, $a_{1,2}$, $a_{1,3}$ are all in $\beta$. Then by Lemma 2, $a_{2,1}$ and $a_{2,2}$ are both in
$\beta$ as well, and $a_{1,1}$, $a_{1,3}$, $a_{2,1}$, $a_{2,2}$ form an N subnetwork $Q_N$ of $Q$. In this case, consider the activities
$\{a_{1,4}, a_{1,5}, a_{1,6}, a_{1,7}, a_{1,8}\}$. At least one of these must be in $\alpha$, by Lemma 2 and since $\alpha$ is non-empty.
Denote this activity by $a$. In $Q$, $a$ does not precede any of the activities of $Q_N$.

In either case, in $Q'$ there must be two activities $b$ and $c$ of $Q_N$ which form an antichain in $Q$ but a
chain in $Q'$. Now $a$ and $b$ form an antichain in $Q$ but $a$ precedes $b$ in $Q'$, and the same observation holds
for $a$ and $c$. Hence $\{a, b, c\}$ forms an antichain in $Q$ but a chain in $Q'$.

Let $t(a) = t(b) = t(c) = 1$ and $t(x) = \epsilon$ for every other activity $x$. The slowdown of $Q'$ is then at
least $3/(1 + 2\epsilon)$, which can be made arbitrarily close to 3. In particular, if $\epsilon = 1/10$ then the slowdown
is at least 5/2. □
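The arithmetic behind the final step can be made explicit. In $Q$ every chain has at most three activities and contains at most one of the three unit-duration activities (they form an antichain), so the makespan of $Q$ is at most $1 + 2\epsilon$; in $Q'$ the three unit-duration activities lie on one chain, so its makespan is at least 3. The short Python sketch below, whose function name is a hypothetical choice for this illustration, evaluates the resulting lower bound exactly for a few values of $\epsilon$.

```python
from fractions import Fraction

def slowdown_lower_bound(eps):
    """Lower bound 3 / (1 + 2*eps) obtained from the workload in the proof of Theorem 2."""
    return Fraction(3) / (1 + 2 * Fraction(eps))

# Evaluate the bound for eps = 1/10, 1/100 and 1/1000.
bounds = [slowdown_lower_bound(Fraction(1, k)) for k in (10, 100, 1000)]
for k, b in zip((10, 100, 1000), bounds):
    print(f"eps = 1/{k}: slowdown >= {b} (about {float(b):.4f})")
```

Each value exceeds 2 and stays below 3, matching the statement that the bound approaches 3 as $\epsilon$ tends to zero.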

We next state a workload-dependent conjecture, and provide evidence for it. 



7 New conjecture 



Conjecture 2. For any activity network $G = (V, E)$ and workload $t : V \to (0, \infty)$, there exists an SP
extension $G_{SP}$ of $G$, such that

$\frac{T(G_{SP}, t)}{T(G, t)} \le \frac{4}{3}.$

We now need to consider the evidence to support this conjecture. At least four activities are required to 
represent a non-SP network, and the only non-SP network on four activities is the N network given in 
Figure 1(b). We start by proving the result for the case of four activities. 



Theorem 3. Let $G^4$ be an activity network with four activities and workload $t$; then there exists an SP
extension $G^4_{SP}$ of $G^4$ such that $T(G^4_{SP}, t)/T(G^4, t) \le 4/3$.



Proof. All networks with four activities except the N network are SP, for which $T(G^4_{SP}, t)/T(G^4, t) =
1 \le 4/3$. In the case of $G^4 = N$, label the activities of N so that it has edges $(a, c)$, $(a, d)$ and $(b, d)$.
There are then three minimal SP extensions (in the sense that every other SP extension contains one of
these as a subnetwork):

(K): $(a, c)$, $(a, d)$, $(b, d)$, $(a, b)$

(X): $(a, c)$, $(a, d)$, $(b, d)$, $(b, c)$

(V): $(a, c)$, $(a, d)$, $(b, d)$, $(c, d)$.

Denote $t(x)$ by $x$ for each $x \in \{a, b, c, d\}$. A quantity such as $3(x + y)$ can be written $xyxyxy$ or, using
commutativity, just $xxxyyy$. Also, if $x \le y$ and $x \le z$ then the conclusion $x \le \max\{y, z\}$ can instead be
written as $x \le y \oplus z$. Now $T(N) = \max\{ac, ad, bd\} = ac \oplus ad \oplus bd$, $T(K) = \max\{ac, abd\} = ac \oplus abd$,
$T(X) = \max\{ac, ad, bc, bd\} = ac \oplus ad \oplus bc \oplus bd$, and $T(V) = \max\{acd, bd\} = acd \oplus bd$.

The slowdown is always at least 1, so suppose it is greater than 1 (if it is equal to 1 then the theorem is
true). Then each of $T(K)$, $T(X)$, and $T(V)$ must exceed $T(N)$. Now if $acd \le bd$ then $T(V) = bd \le T(N)$,
a contradiction, so $acd > bd$, and hence $ac > b$. If $abd \le ac$ then $T(K) = ac \le T(N)$, a contradiction,
so $abd > ac$, and hence $bd > c$. If $b \le a$ then $T(X) = ac \oplus ad \le T(N)$, a contradiction, so $b > a$. If
$c \le d$ then $T(X) = ad \oplus bd \le T(N)$, a contradiction, so $c > d$. Combined, this yields $ac > b > a$ and
$bd > c > d$. This leads to $T(N) = ac \oplus bd$, $T(K) = abd$, $T(X) = bc$, and $T(V) = acd$. Of the three
possibilities for an SP extension with minimal makespan, we analyse K (symmetric to V); X is similar.

Since K has minimal makespan among SP extensions, $abd \le bc$ and $abd \le acd$, so $ad \le c$ and
$b \le c$. Hence $abd \le cc$, so $aaabbbddd \le abccccd$. Therefore either $bbbddd \le acccc$ or $aaa \le bd$.
In the first case, $aaabbbddd \le aaaacccc$, and in the second case, $aaabbbddd \le bbbbdddd$. In either
event, $aaabbbddd \le aaaacccc \oplus bbbbdddd$. However, $abd$ is just $T(K)$ and $ac \oplus bd$ is just $T(N)$, so
$T(K)/T(N) \le 4/3$. □

Each of the three minimal SP extensions of N with the workload $t(a) = 1$, $t(b) = 2$, $t(c) = 2$ and
$t(d) = 1$ has the same makespan of 4, while $T(N, t) = 3$, so the slowdown in this case is at least 4/3. This
shows that if the workload-dependent bounded slowdown conjecture holds, then the 4/3 bound is tight.
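The tightness example is small enough to check directly. The following Python sketch is only an illustration: it computes the makespan as the longest weighted chain, and the names `makespan`, `extensions`, `base` and `spans` are hypothetical choices for this example.

```python
def makespan(edges, t):
    """Length of the longest weighted chain in a small acyclic network."""
    preds = {v: [] for v in t}
    for u, v in edges:
        preds[v].append(u)

    def finish(v):
        return t[v] + max((finish(u) for u in preds[v]), default=0)

    return max(finish(v) for v in t)

N = [("a", "c"), ("a", "d"), ("b", "d")]
extensions = {
    "K": N + [("a", "b")],
    "X": N + [("b", "c")],
    "V": N + [("c", "d")],
}
t = {"a": 1, "b": 2, "c": 2, "d": 1}

base = makespan(N, t)
spans = {name: makespan(edges, t) for name, edges in extensions.items()}
print(base, spans)
```

All three minimal SP extensions give makespan 4 against a makespan of 3 for N, so the best achievable slowdown for this workload is exactly 4/3.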

Theorem 3 is independent of specific workloads. This is also the case for the next theorem. The 
five-activity case requires case analysis but it is done by contradiction rather than by the direct method 
used in the four-activity case, and it also uses max-plus algebra. Some additional remarks are necessary. 

Directed acyclic graphs can be decomposed into modules [14]. When the edges form a transitive 
relation, modules have either series or parallel structure, or cannot be further decomposed. Modular 
decomposition for activity networks can then be thought of as an extension of the SP grammar in 
Section 4 by adding a terminal M representing those networks that cannot be further decomposed in 
series or in parallel. Such indecomposable networks include the N network and $ns(d, w, 3)$ for $d \ge 3$ and
$w \ge 3$.

Theorem 4. Let $G^5$ be an activity network with five activities and workload $t$; then there exists an SP
extension $G^5_{SP}$ of $G^5$ such that $T(G^5_{SP}, t)/T(G^5, t) \le 4/3$.

Proof. There are 16 non-isomorphic non-SP activity networks with five activities. An activity network 
and its dual^4 have the same slowdown results, so we need only consider 9 activity networks. Six of
these can be analysed using decomposition which yields an SP network with unit slowdown, together 
with an N network to which Theorem 3 can be applied, and the two slowdowns can then be combined 
[16, Theorem 5.12]. For the three remaining indecomposable networks, the minimal SP extensions are 
identified, and each case is checked using arguments similar to those in the proof of Theorem 3, yielding 
sets of inequalities which each lead to a contradiction if slowdown greater than 4/3 is assumed. □ 

8 Programmatic approach 

The six-activity case has been checked using an approach that is now described. Our implementation 
also verified the proofs for 4 and 5 activities. 

For a fixed number of activities n, we want to consider all non-SP networks with n activities, and for 
each of these, to show that for every possible workload there is a SP extension which achieves the 4/3 
bound. Working inductively, for networks with fewer than n activities we have already shown the 4/3 
bound. We also only need to consider activity networks up to isomorphism. Additionally, we do not need 
to consider networks that can be decomposed such that there is at least one series or parallel node in the 
decomposition, since the slowdown is then bounded above by the slowdown of an activity network with 
fewer than n activities [16, Theorem 5.12]. Therefore we need only consider indecomposable networks and
those which are decomposable but where every module is indecomposable. 

The overall schema is to consider each possible activity network G in turn, assuming that it is a 
counterexample. Each of its SP extensions then has slowdown exceeding 4/3. This generates a system 
of inequalities, and we can then demonstrate that this system has no solution. 

First all possible n-activity networks are generated and classified into SP, decomposable (but not
SP), or indecomposable. Isomorphic activity networks are discarded, reducing the number of candidate
counterexamples. For some candidate $G$, consider each minimal extension $H$. Only considering minimal
extensions is valid because any non-minimal extension $H'$ will give $T(H) \le T(H')$. Since $G$ is assumed to be
a counterexample, every SP extension $H$
exceeds the 4/3 bound, so we require that $4T(G) < 3T(H)$ for each such extension. It is also necessary to
consider some extensions that are not SP as these generate additional, necessary constraints. Specifically,
we consider the decomposable extensions because they have slowdown of at most 4/3. If $T(G) = T(H)$
for a decomposable extension $H$ then by the inductive hypothesis we could find an SP extension of $G$
that would meet the bound, hence we require that $T(G) < T(H)$ for every decomposable extension $H$ of
$G$.

^4 The dual of a directed graph is the graph with its edges reversed.

We now need to ask which workloads can allow all these constraints to hold simultaneously. Since
$T(H) = \max\{T(C) \mid C \text{ is a chain in } H\}$, we can consider each possible chain as a critical path and
generate additional constraints that $T(C) \ge T(D)$ for all chains $D$ in $H$. Hence we need to consider the
disjunction of the sets of inequalities

$\{T(G) < T(C)\} \cup \{4T(G) < 3T(C)\} \cup \{T(C) \ge T(D) \mid D \text{ a chain in } H,\ D \ne C\}$

for every maximal chain C in H. We only need to consider maximal chains since non-maximal chains 
have lower makespan. The makespan of a chain is simply the sum of its activity durations, so each choice 
of critical path C generates a system of linear inequalities expressed with variables that represent the 
unknown activity durations. 
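To make the construction concrete, the sketch below enumerates maximal chains and writes out one linear system per candidate critical chain for the extension K of the N network. It is an illustration under our own assumptions, not the implementation used for the results: networks are given by the edges of their transitive reduction, $4T(G) < 3T(C)$ is expanded into one inequality per maximal chain of $G$, inequalities are emitted as plain strings rather than passed to a solver, and all function names are hypothetical.

```python
def maximal_chains(nodes, reduced_edges):
    """Source-to-sink paths of the transitive reduction, i.e. the maximal chains."""
    succ = {v: [] for v in nodes}
    has_pred = set()
    for u, v in reduced_edges:
        succ[u].append(v)
        has_pred.add(v)

    def extend(path):
        if not succ[path[-1]]:
            yield tuple(path)
        for v in succ[path[-1]]:
            yield from extend(path + [v])

    for v in nodes:
        if v not in has_pred:
            yield from extend([v])

def systems(g_chains, h_chains):
    """One system of linear inequalities per candidate critical chain C of H."""
    for c in h_chains:
        rhs = "3*(" + "+".join(c) + ")"
        # 4 T(G) < 3 T(C): every maximal chain P of G must satisfy 4 T(P) < 3 T(C).
        system = [("4*(" + "+".join(p) + ")", "<", rhs) for p in g_chains]
        # C is critical in H: T(C) >= T(D) for every other maximal chain D of H.
        system += [("+".join(c), ">=", "+".join(d)) for d in h_chains if d != c]
        yield c, system

g_chains = sorted(maximal_chains("abcd", [("a", "c"), ("a", "d"), ("b", "d")]))
k_chains = sorted(maximal_chains("abcd", [("a", "b"), ("a", "c"), ("b", "d")]))
k_systems = dict(systems(g_chains, k_chains))
for chain, system in k_systems.items():
    print(chain, system)
```

In this small case the system for the candidate critical chain $(a, c)$ contains $4(a + c) < 3(a + c)$, which has no solution with positive durations, so only the chain $(a, b, d)$ needs further analysis.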

These inequalities can now be fed to a constraint solver such as clp(q) [9] to check whether a workload
exists that meets the constraints. If one is found then we have found a counterexample to the 4/3
conjecture. For 4, 5, and 6 activities, an exhaustive search showed that no counterexamples exist.
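
The shape of this check can be illustrated on the smallest non-SP network. The sketch below is not the clp(q) procedure: it swaps the constraint solver for a brute-force search over a small grid of integer workloads, so it is only a sanity check and proves nothing. It assumes the four-activity N-shaped network a<c, b<c, b<d, and the three minimal SP extensions listed in the comments were worked out by hand for this illustration rather than taken from the text.

```python
from fractions import Fraction
from itertools import product

def t_G(a, b, c, d):
    """Makespan of the N-shaped network a<c, b<c, b<d."""
    return max(a + c, b + c, b + d)

def sp_extension_makespans(a, b, c, d):
    """Makespans of the three minimal SP extensions (assumed, hand-derived)."""
    return (
        max(a, b) + max(c, d),   # add a<d: (a || b) ; (c || d)
        b + max(a + c, d),       # add b<a: b ; ((a ; c) || d)
        max(a, b + d) + c,       # add d<c: (a || (b ; d)) ; c
    )

def is_counterexample(w):
    """True if every SP extension H satisfies 4T(G) < 3T(H)."""
    g = t_G(*w)
    return all(4 * g < 3 * h for h in sp_extension_makespans(*w))

# Search a small grid of workloads; exact arithmetic avoids rounding issues.
grid = [Fraction(k) for k in range(1, 7)]
found = [w for w in product(grid, repeat=4) if is_counterexample(w)]
print(found)

# A workload on which the best extension has slowdown exactly 4/3.
tight = (2, 1, 1, 2)
print(Fraction(min(sp_extension_makespans(*tight)), t_G(*tight)))
```

On this grid the search finds no workload violating the bound, which agrees with the four-activity result, and the workload (2, 1, 1, 2) shows that the ratio 4/3 is attained for this network.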

We have proved formally that the 4/3 bound holds for the four-activity and five-activity cases, and
we have a programmatic proof of the six-activity case. This provides some evidence that Conjecture 2 is
true. The techniques used for smaller indecomposable networks can also be applied to the seven-activity
case. However, the systems of inequalities are too large to handle with the tools currently used, so such
a proof would require new techniques or tools.

9 Conclusions and further work 

Series-parallelising an activity network is done implicitly when a program is expressed in an inherently
series-parallel formalism, or explicitly for the purposes of aiding scheduling. We now consider the
implications of the bound for LC slowdown, the disproof of the factor-of-2 conjecture, and the new
factor-of-4/3 conjecture.

As shown in Section 5, LC slowdown is bounded above. If all activities have very similar durations, a 
good bound is obtained and LC extensions are useful. However, this bound is not necessarily tight when 
durations vary. 

In the motivating example, deciding which series-parallelisation to use at the time of writing the
program forces a particular series-parallelisation before the workload is known. Consider a parallel
programming environment that only allows SP activity networks to be expressed. At the time of writing,
MATLAB is one such environment and we believe that in practice both Mathematica and OpenCL also
require activity networks to be SP.*

Theorem 2 shows that requiring the series-parallelisation to be chosen before the workload is
accurately known may result in slowdown of more than 2. Iterating the construction for larger NS
networks (of greater width as well as depth) allows the slowdown to be forced to be arbitrarily large.

Neighbour-synchronised networks are common in practice and may be quite large. The workload in
practice may be different to what was expected when writing the program; for instance, contention for
shared resources, communication delays, and cache misses are just some of the stochastic effects that affect
parallel computation and that may produce large variations in the duration of an activity. Therefore,
choosing a series-parallelisation without taking into account possible variations in workload may lead to
large slowdown.

* Mathematica and OpenCL both provide SP constructs, as well as more general methods to specify synchronization
between activities; unfortunately these require creating objects for each precedence constraint. Such a heavy-weight
mechanism only makes sense if activities are all very large (for instance, if the program consists of just a few
threads), or there are only few precedence constraints.

If one postpones the decision, it may be possible to do automated analysis at compile time, or the
scheduler may be able to work around any locally arising bottlenecks due to stochastic variation in activity
durations. Hence it would seem to be worthwhile allowing sufficient expressivity in the language so that
one can more closely approximate the activity network of a computation.

On a positive note, if one can find a series-parallelisation that achieves the conjectured 4/3 bound,
then the impact of adding constraints is limited: the program will take at most one-third as long again
as it would have taken without the additional constraints. This seems a reasonable penalty to pay to
obtain a structure that makes many scheduling problems easier.

However, one needs to take into account the cost of finding a series-parallelisation that achieves the 
bound. Consider the optimisation problem 



MINIMUM SERIES-PARALLELISATION (MSP)
Input: poset G, workload t: V(G) → (0, ∞)
Output: poset H, where H is an SPE of G
Criterion: minimise T(H).



Let |x| denote the size of an instance x of MSP. It is easy to show that MSP is in the complexity class
NPO [2]. Computing the level-constrained extension of an activity network can be done in polynomial
time as discussed in Section 4. The approximation ratio of this procedure is bounded by 2^{O(|x|)}. MSP
is therefore in the class exp-APX, which is strictly contained in NPO unless P = NP [2].

Conjecture 2 implies that MSP can be approximated within a factor of 4/3, but there is not necessarily
a polynomial-time algorithm that can achieve this. A branch-and-bound algorithm for solving MSP never
needs to consider more than 2^{O(|x|)} possible extensions, each corresponding to a subset of edges.
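
To make the size of this search space concrete, the following is a minimal sketch of plain exhaustive enumeration (without the pruning a branch-and-bound method would add) over subsets of candidate edges for a very small instance. It assumes activities are numbered 0 to n-1, and it tests series-parallelism through the standard N-free characterisation of series-parallel posets, which is assumed here to coincide with the SP definition used in this paper. All function names are hypothetical.

```python
from itertools import combinations, permutations

def closure(n, edges):
    """Transitive closure as a boolean matrix, or None if the edges are cyclic."""
    lt = [[False] * n for _ in range(n)]
    for u, v in edges:
        lt[u][v] = True
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if lt[i][k] and lt[k][j]:
                    lt[i][j] = True
    return None if any(lt[i][i] for i in range(n)) else lt

def is_sp(n, lt):
    """N-free test: reject any induced a<c, b<c, b<d with no other relations."""
    for a, b, c, d in permutations(range(n), 4):
        if lt[a][c] and lt[b][c] and lt[b][d]:
            rest = [(a, b), (b, a), (a, d), (d, a), (c, d), (d, c)]
            if not any(lt[x][y] for x, y in rest):
                return False
    return True

def makespan(n, lt, t):
    """Longest-path makespan of the poset given by the closure matrix."""
    finish = {}
    def f(v):
        if v not in finish:
            later = (f(w) for w in range(n) if lt[v][w])
            finish[v] = t[v] + max(later, default=0)
        return finish[v]
    return max(f(v) for v in range(n))

def brute_force_msp(n, edges, t):
    """Try every subset of added edges; return (best makespan, added edges)."""
    base = closure(n, edges)
    free = [(u, v) for u in range(n) for v in range(n)
            if u != v and not base[u][v] and not base[v][u]]
    best = None
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            lt = closure(n, list(edges) + list(extra))
            if lt is None or not is_sp(n, lt):
                continue            # cyclic, or not series-parallel
            m = makespan(n, lt, t)
            if best is None or m < best[0]:
                best = (m, extra)
    return best

# Hypothetical instance: the N-shaped network with workload (2, 1, 1, 2).
print(brute_force_msp(4, [(0, 2), (1, 2), (1, 3)], [2, 1, 1, 2]))
```

The number of subsets examined doubles with every additional incomparable ordered pair, so this enumeration is only practical for networks with a handful of activities.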

So a polynomial-time algorithm achieves slowdown of at most 2^{O(|x|)}. On the other hand, an SP
extension with minimal slowdown can be found in 2^{O(|x|)} time, and Conjecture 2 would bound this
slowdown as being at most 4/3. It is not clear how to close this gap; it appears possible that MSP is
exp-APX-hard.

MSP also seems related to the classical decision problem MINIMUM PRECEDENCE CONSTRAINED 
SCHEDULING (MPCS) [6], which is NP-complete. The difficulty of MPCS derives from there being only 
a limited number of processors. In contrast, MSP appears to be difficult because the output network must 
be series-parallel. The (4/3 - ε)-inapproximability of MPCS [8] suggests that a similar inapproximability
result may exist for MSP. 

Several directions for future work are envisaged. The first relates to the proof of Conjecture 2, at
least for 7 activities. This requires improving the implementation so that its correctness can be verified
and finding more powerful techniques that avoid case analysis. Second, a programming construct to specify
NS networks could be added to existing programming environments and its performance established.
Finally, if the decision version of MSP could be shown to be NP-complete, perhaps by reduction from
MPCS, then the NP-hardness of MSP would follow. Proving that MSP is exp-APX-hard is another goal.